Abstract

We consider univariate regression estimation from an individual (non-random)
sequence $(x_1, y_1), (x_2, y_2), \ldots \in \mathbb{R} \times \mathbb{R}$, which is stable in the sense that for each
interval $A \subseteq \mathbb{R}$, (i) the limiting relative frequency of $A$ under $x_1, x_2, \ldots$ is
governed by an unknown probability distribution $\mu$, and (ii) the limiting average
of those $y_i$ with $x_i \in A$ is governed by an unknown regression function $m(\cdot)$.

A computationally simple scheme for estimating $m(\cdot)$ is exhibited, and is
shown to be $L_2$ consistent for stable sequences $\{(x_i, y_i)\}$ such that $\{y_i\}$ is
bounded and there is a known upper bound for the variation of $m(\cdot)$ on intervals
of the form $(-i, i]$, $i \ge 1$. Complementing this positive result, it is shown
that there is no consistent estimation scheme for the family of stable sequences
whose regression functions have finite variation, even under the restriction that
$x_i \in [0, 1]$ and $y_i$ is binary-valued.



Key words and phrases: nonparametric estimation, regression estimation,
individual sequences, ergodic time series.



1 Introduction 



Individual numerical sequences (binary and real-valued) have played an important 
role in the theory of data compression and computational complexity. The theory 
of lossless data compression developed by Ziv and Lempel [12], Ziv [24], and the 
complexity theory of Kolmogorov [8, 9] and Chaitin [3] are both formulated within a 
purely deterministic framework that is built around individual sequences. Subsequent 
work in these areas has considered useful notions of randomness, compressibility, and 
predictability. More recently, individual sequences have been studied in the context 
of statistical learning theory. In spite of the above research, there has been little 
consideration of individual sequences in the context of classical statistical estimation. 

It is common in statistics to treat data, for the purposes of analysis, as a sequence 
of (typically independent) identically distributed random variables. This stochastic 
point of view collapses when one is faced with a particular collection of data, which is 
a fixed sequence of numbers or vectors from which we hope to learn something about 
the state of nature. 

It is natural then to (re)formulate some classical statistical problems in terms
of individual sequences. We concern ourselves here with the important problem of
regression estimation. In the common statistical setting one is given $n$ independent
replicates $(X_1, Y_1), \ldots, (X_n, Y_n)$ of a jointly distributed pair $(X, Y) \in \mathbb{R} \times \mathbb{R}$, and
asked to find an estimate of the regression function $m(x) = E[Y \mid X = x]$. Justification
for estimation of $m(x)$ comes from the fact that it minimizes $E(h(X) - Y)^2$ over all
functions $h(\cdot)$ of $X$. Thus $m(\cdot)$ is the least squares estimate of $Y$ given $X$.
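
The minimizing property of $m$ can be made explicit by a short calculation. The following is a sketch under the additional assumption (not stated above) that $E Y^2 < \infty$ and $E h(X)^2 < \infty$, so that all terms are finite.

```latex
\begin{aligned}
E\bigl(h(X) - Y\bigr)^2
  &= E\bigl(h(X) - m(X)\bigr)^2 + E\bigl(m(X) - Y\bigr)^2
     + 2\,E\bigl[(h(X) - m(X))(m(X) - Y)\bigr] \\
  &= E\bigl(h(X) - m(X)\bigr)^2 + E\bigl(m(X) - Y\bigr)^2 ,
\end{aligned}
% The cross term vanishes: conditioning on X gives
% E[(h(X) - m(X)) (m(X) - E[Y | X])] = 0, because m(X) = E[Y | X].
```

The second term does not depend on $h$, so the left-hand side is smallest when $h = m$ (almost surely with respect to the distribution of $X$).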

In this paper we present and analyze a simple regression estimation procedure 
that is applicable in a purely deterministic setting. By applying our estimates to 
individual sample paths, we easily establish their almost-sure consistency for ergodic 
processes having suitable one-dimensional distributions (the dependence structure of
the process is unimportant). The approach and results of this paper are motivated 
by, and closely related to, recent results of [17] on density estimation from individual 
sequences. 

For independent and weakly dependent stochastic data, a variety of estimation
schemes have been proposed, including procedures based on histograms, kernels,
neural networks, orthogonal series, wavelets, and nearest neighbors. For a description of
some of these methods see, for example, Gyorfi, Hardle, Sarda, and Vieu [6],
Roussas [20], Devroye, Krzyzak [5] and the references therein. Kulkarni and Posner [11]
studied nearest neighbor regression estimates in the case where $x_1, x_2, \ldots$ are
deterministic, but $Y_1, Y_2, \ldots$ are random and conditionally independent given the $x_i$'s.
Yakowitz et al. [23] considered a family of truncated histogram regression estimates
for processes with vector-valued covariates. For each constant $L > 0$ they exhibit
a sequence of estimates that is almost surely pointwise consistent for every ergodic
process $\{(X_i, Y_i)\}$ whose regression function satisfies a Lipschitz condition of the form
$|m(x) - m(y)| \le L\|x - y\|$. In practice, the constant $L$ is known and fixed in advance
of the data. Related work has been done in the area of nonparametric forecasting
for a stationary process $\{X_i\}$. Cover [4] posed some natural questions which have been
addressed by Bailey [2], Ryabko [21], and Ornstein [18], and more recently by Algoet
[1], Morvai, Yakowitz, Gyorfi [15] and Morvai, Yakowitz, Algoet [14]. Nobel [16] has
shown that no regression procedure is consistent for every bivariate ergodic process,
even if one assumes that $X_i$ is bounded and $Y_i$ is binary valued. A similar negative
result for individual sequences is established in Theorem 2 below.

In order to study regression estimation in a deterministic setting one must first
specify how an individual sequence $(x_1, y_1), (x_2, y_2), \ldots$ can contain information about
a regression function. In the present paper, following [17], it is required that suitable
averages over the sequence are convergent or 'stable'. The deterministic setting of this
paper is also in line with other recent work on individual sequences in information
theory, statistics, and learning theory (cf. [24, 13, 7]). The principal contribution of
the paper is to show how one may extract asymptotic information from the sequence
in the absence of probabilistic inequalities, mixing conditions, rates of convergence,
and so on. The deterministic setting is described in Section 2 and the principal
results of the paper are stated in Section 3. Proofs of the principal results are given
in Sections 4 and 5.

2 The Deterministic Setting



Let $\mu$ be a probability distribution on $(\mathbb{R}, \mathcal{B})$, and let $m : \mathbb{R} \to \mathbb{R}$ be a function
satisfying $\int |m(x)|\,\mu(dx) < \infty$. Let $\mathbf{x} = (x_1, x_2, \ldots)$ and $\mathbf{y} = (y_1, y_2, \ldots)$ be infinite
sequences of real numbers. For each interval $A \subseteq \mathbb{R}$ define the signed measure

$$\nu(A) = \int_A m(x)\,\mu(dx).$$

For each $n \ge 1$ define the relative frequency

$$\hat\mu_n(A) = \frac{1}{n} \sum_{i=1}^{n} I\{x_i \in A\},$$

and the joint sample average

$$\hat\nu_n(A) = \frac{1}{n} \sum_{i=1}^{n} y_i I\{x_i \in A\}.$$
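
To make the two empirical quantities concrete, the following minimal sketch computes the relative frequency and the joint sample average for an interval of the form $(a, b]$. The function names `mu_hat` and `nu_hat` and the sample data are illustrative choices, not part of the paper.

```python
from typing import Sequence


def mu_hat(xs: Sequence[float], a: float, b: float) -> float:
    """Relative frequency of the interval (a, b] among x_1, ..., x_n."""
    return sum(1 for x in xs if a < x <= b) / len(xs)


def nu_hat(xs: Sequence[float], ys: Sequence[float], a: float, b: float) -> float:
    """Joint sample average: (1/n) * sum of y_i over those i with x_i in (a, b]."""
    return sum(y for x, y in zip(xs, ys) if a < x <= b) / len(xs)


xs = [0.1, 0.4, 0.6, 0.9]
ys = [1.0, 0.0, 1.0, 1.0]
print(mu_hat(xs, 0.0, 0.5))        # 0.5  (two of the four x_i fall in (0, 0.5])
print(nu_hat(xs, ys, 0.0, 0.5))    # 0.25 ((1.0 + 0.0) / 4)
```

Intervals of the form $(-\infty, b]$ are obtained by passing `float("-inf")` as the left endpoint.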

The sequence $\mathbf{x}$ will be said to have limiting distribution $\mu(\cdot)$ if

$$\hat\mu_n(-\infty, t] \to \mu(-\infty, t] \quad \text{and} \quad \hat\mu_n(\{t\}) \to \mu(\{t\}) \quad \text{for every } t \in \mathbb{R}, \tag{1}$$

and the pair $(\mathbf{x}, \mathbf{y})$ will be said to have limiting regression $m(\cdot)$ if

$$\hat\nu_n(-\infty, t] \to \nu(-\infty, t] \quad \text{and} \quad \hat\nu_n(\{t\}) \to \nu(\{t\}) \quad \text{for every } t \in \mathbb{R}. \tag{2}$$

(Note that the second condition is superfluous in each case if $\mu$ is non-atomic.) By
a minor modification of a standard proof of the Glivenko-Cantelli Theorem (such as
that in Pollard [19]), one may show that if $\mathbf{x}$ has limiting distribution $\mu(\cdot)$ then in
fact

$$\sup_{A \in \mathcal{A}} |\hat\mu_n(A) - \mu(A)| \to 0, \tag{3}$$

where $\mathcal{A}$ is the collection of all intervals of the form $(a, b]$ and $(-\infty, b]$ with $a, b \in \mathbb{R}$.

An individual sequence $(\mathbf{x}, \mathbf{y})$ satisfying (1) and (2) will be called stable. Let
$\Omega(\mu, m)$ denote the set of stable sequences with limiting distribution $\mu$ and limiting
regression $m$. Stability concerns only the asymptotic behavior of $\hat\mu_n$ and $\hat\nu_n$, which
need not converge to their respective limits at any particular rate. No constraints are
placed on the mechanism by which the individual sequences $(\mathbf{x}, \mathbf{y})$ are produced. Note
in particular that membership of $(\mathbf{x}, \mathbf{y})$ in $\Omega(\mu, m)$ is unaffected if one adds to $\mathbf{x}$ and $\mathbf{y}$
finite prefixes $x'_1, \ldots, x'_k$ and $y'_1, \ldots, y'_k$ having the same length. The next proposition,
showing that the sample paths of ergodic processes are stable with probability one,
follows easily from Birkhoff's ergodic theorem.

Proposition 1 Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be stationary such that $E|Y_1| < \infty$. Then
$\{(X_i, Y_i)\}$ is stable with probability one. If, in addition, $\{(X_i, Y_i)\}$ is ergodic then
$(\mathbf{X}, \mathbf{Y}) \in \Omega(\mu, m)$ with probability one, where $\mu(A) = P(X_1 \in A)$, $m(x) = E(Y_1 \mid X_1 = x)$,
$\mathbf{X} = (X_1, X_2, \ldots)$ and $\mathbf{Y} = (Y_1, Y_2, \ldots)$.

Proof: Let $\mathcal{E}$ denote the invariant $\sigma$-algebra. By Birkhoff's pointwise ergodic theorem
(cf. Stout [22] Theorem 3.5.6 p. 176), for an arbitrary Borel-measurable set $A \subseteq \mathbb{R}$, with
probability one,

$$\hat\mu_n(A) \to P(X_1 \in A \mid \mathcal{E}) =: \mu_{\mathcal{E}}(A)$$

and

$$\hat\nu_n(A) \to E(Y_1 I\{X_1 \in A\} \mid \mathcal{E}) =: \nu_{\mathcal{E}}(A).$$

If, in addition, $\{(X_i, Y_i)\}$ is ergodic then $\mathcal{E}$ is the trivial $\sigma$-algebra and so

$$\hat\mu_n(A) \to P(X_1 \in A)$$

and

$$\hat\nu_n(A) \to E(Y_1 I\{X_1 \in A\}).$$

The rest follows from the standard proof of the Glivenko-Cantelli Theorem (cf. Pollard
[19]). □
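
As an informal illustration of Proposition 1 in the simplest (i.i.d., hence ergodic) case, the sketch below simulates one sample path and compares the empirical quantities on $A = (0, 0.5]$ with their limits. The model is an assumption chosen for the example: $X_i$ uniform on $[0, 1)$ and $P(Y_i = 1 \mid X_i = x) = x$, so that $m(x) = x$, $\mu(A) = 0.5$ and $\nu(A) = \int_0^{0.5} x\,dx = 0.125$.

```python
import random

rng = random.Random(0)   # fixed seed so that the run is reproducible
n = 20000
xs = [rng.random() for _ in range(n)]                    # X_i uniform on [0, 1)
ys = [1.0 if rng.random() < x else 0.0 for x in xs]      # P(Y_i = 1 | X_i = x) = x


def mu_hat(a: float, b: float) -> float:
    """Relative frequency of (a, b] along the simulated path."""
    return sum(1 for x in xs if a < x <= b) / n


def nu_hat(a: float, b: float) -> float:
    """Joint sample average of y_i over those i with x_i in (a, b]."""
    return sum(y for x, y in zip(xs, ys) if a < x <= b) / n


print(mu_hat(0.0, 0.5), "should be close to mu(A) = 0.5")
print(nu_hat(0.0, 0.5), "should be close to nu(A) = 0.125")
```

A single finite run only suggests the convergence; the proposition itself is an almost-sure statement about the infinite path.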

Remark 1. Note that for individual sequences,

$$\hat\mu_n(-\infty, t] \to \mu(-\infty, t] \quad \text{for all } t \in \mathbb{R}$$

does not necessarily imply

$$\hat\mu_n(\{t\}) \to \mu(\{t\}) \quad \text{for all } t \in \mathbb{R}.$$

Indeed, with $\mathbf{x} = (-\frac{1}{1}, -\frac{1}{2}, \ldots)$, $\hat\mu_n(-\infty, t] = 1$ for $t \ge 0$, while $\hat\mu_n(-\infty, t] \to 0$ for
$t < 0$. Thus the limiting distribution $\mu$ should concentrate on the atom $\{0\}$, but
$\hat\mu_n(\{0\}) = 0$ for all $n$.
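
The phenomenon in Remark 1 is easy to check numerically. The sketch below takes $x_i = -1/i$, which is our reading of the sequence in the remark and should be treated as an assumption; the helper names are illustrative.

```python
def cdf_hat(xs, t):
    """Empirical value of mu_hat_n(-inf, t]."""
    return sum(1 for x in xs if x <= t) / len(xs)


def atom_hat(xs, t):
    """Empirical mass mu_hat_n({t}) placed exactly at the point t."""
    return sum(1 for x in xs if x == t) / len(xs)


n = 1000
xs = [-1.0 / i for i in range(1, n + 1)]

print(cdf_hat(xs, 0.0))     # 1.0: every x_i is <= 0
print(cdf_hat(xs, -0.5))    # 0.002: only x_1 = -1 and x_2 = -0.5 qualify
print(atom_hat(xs, 0.0))    # 0.0: no x_i equals 0
```

The empirical distribution function therefore converges to that of a point mass at 0, while the empirical mass at 0 stays at zero, which is why the atom condition in (1) is imposed separately.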

3 Statement of Principal Results 

Recall that the total variation of a real-valued function $h$ defined on an interval $(a, b]$
is given by

$$V(h : a, b) = \sup \sum_{i=1}^{n} |h(t_i) - h(t_{i-1})|,$$

where the supremum is taken over all finite ordered sequences

$$a < t_0 < t_1 < \cdots < t_{n-1} < t_n = b.$$

Let $\mathbb{N}$ denote the positive integers. For each non-decreasing function $a : \mathbb{N} \to (0, \infty)$,
let $\mathcal{F}(a)$ denote the set of bounded measurable functions $m : \mathbb{R} \to \mathbb{R}$ such that
$V(m : -i, i) \le a(i)$ for all $i \ge 1$. Let $\pi_0 = \{\mathbb{R}\}$, and for each $k \ge 1$ let $\pi_k$ be the
partition of $\mathbb{R}$ consisting of the dyadic intervals

$$\left\{ \left( \frac{j-1}{2^k}, \frac{j}{2^k} \right] : -\infty < j < \infty \right\}.$$

Let $\pi_k[x]$ denote the unique cell of $\pi_k$ containing $x \in \mathbb{R}$. Note that $\pi_{k+1}$ refines $\pi_k$,
and that for each $x$,

$$\lim_{k \to \infty} \mathrm{len}(\pi_k[x]) = 0,$$

where $\mathrm{len}(A)$ denotes the length of an interval $A$.
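
A small sketch of the cell lookup $x \mapsto \pi_k[x]$ for $k \ge 1$ follows. It assumes the half-open convention $((j-1)/2^k, j/2^k]$ for the dyadic cells, in line with the intervals $(a, b]$ used elsewhere in the paper; the function name `dyadic_cell` is hypothetical.

```python
import math


def dyadic_cell(x: float, k: int) -> tuple:
    """Endpoints (left, right) of the cell ((j-1)/2^k, j/2^k] of pi_k containing x."""
    scale = 2 ** k
    j = math.ceil(x * scale)      # smallest integer j with x <= j / 2^k
    return ((j - 1) / scale, j / scale)


print(dyadic_cell(0.3, 2))    # (0.25, 0.5)
print(dyadic_cell(0.5, 2))    # (0.25, 0.5): right endpoints belong to the cell
print(dyadic_cell(-0.3, 1))   # (-0.5, 0.0)
```

Each cell at level $k + 1$ sits inside a cell at level $k$, and the cell length $2^{-k}$ tends to zero, which are the two properties used in the text.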

Let $m \in \mathcal{F}(a)$ be arbitrary. Let $\mu$ denote an arbitrary probability distribution on
$\mathbb{R}$. Fix two numerical sequences $\mathbf{x}$ and $\mathbf{y}$ such that $(\mathbf{x}, \mathbf{y}) \in \Omega(\mu, m)$. For each $k \ge 1$
we define a histogram regression estimate based on $\pi_k$ and adaptively chosen initial
sequences of $\mathbf{x}$ and $\mathbf{y}$. For each $n \ge 1$, $k \ge 0$ define

$$\hat m_{k,n}(x) = \frac{\hat\nu_n(\pi_k[x])}{\hat\mu_n(\pi_k[x])},$$

where by convention $0/0 = 0$. Note that $\hat m_{k,n}$ is piecewise constant on the cells of $\pi_k$.
Let $\tau_0 = 1$ and for each $k \ge 1$ define

$$\tau_k = \min\{n > \tau_{k-1} : V(\hat m_{k,n} : -i, i) < 4a(i) \text{ for all } 1 \le i \le k\}.$$

By Lemma 1, $\tau_k$ is well defined and finite. Note that $\tau_k \to \infty$. Define the estimate

$$\hat m_k = \hat m_{k,\tau_k}.$$

Note that $\hat m_k$ depends only on the pairs $(x_1, y_1), \ldots, (x_{\tau_k}, y_{\tau_k})$. To create a fixed
sample size version of the estimate for $n \ge 1$ let

$$\kappa_n = \max\{k \ge 0 : \tau_k \le n\}$$

and define

$$\tilde m_n = \hat m_{\kappa_n}.$$
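
The construction above can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the histogram estimate is taken to be the ratio of the joint sample average to the relative frequency on each cell, cells are the half-open dyadic intervals $((j-1)/2^k, j/2^k]$, the stopping threshold is $4a(i)$, and the function names and sample data are hypothetical.

```python
import math
from collections import defaultdict


def cell_index(x, k):
    """Index j of the dyadic cell ((j-1)/2^k, j/2^k] containing x (a single cell when k = 0)."""
    return 0 if k == 0 else math.ceil(x * 2 ** k)


def histogram_estimate(xs, ys, k, n):
    """Cell averages of m_hat_{k,n} from the first n pairs; absent cells mean 0 (0/0 = 0)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs[:n], ys[:n]):
        j = cell_index(x, k)
        sums[j] += y
        counts[j] += 1
    return {j: sums[j] / counts[j] for j in counts}


def variation(est, k, i):
    """Sum of jumps between adjacent cells of pi_k inside (-i, i]; empty cells have value 0."""
    lo, hi = -i * 2 ** k + 1, i * 2 ** k
    js = sorted(j for j in est if lo <= j <= hi)
    total, prev = 0.0, None
    for j in js:
        if prev is None:
            total += abs(est[j]) if j > lo else 0.0
        elif j == prev + 1:
            total += abs(est[j] - est[prev])
        else:  # at least one empty (zero-valued) cell lies in between
            total += abs(est[prev]) + abs(est[j])
        prev = j
    if prev is not None and prev < hi:
        total += abs(est[prev])
    return total


def stopping_times(xs, ys, a):
    """tau_0 = 1 followed by every tau_k that is reached within the available data."""
    taus = [1]
    while True:
        k = len(taus)
        found = None
        for n in range(taus[-1] + 1, len(xs) + 1):
            est = histogram_estimate(xs, ys, k, n)
            if all(variation(est, k, i) < 4 * a(i) for i in range(1, k + 1)):
                found = n
                break
        if found is None:
            return taus
        taus.append(found)


def fixed_sample_estimate(xs, ys, a):
    """kappa_n and the cell averages of m_hat_{kappa_n}, with n = len(xs)."""
    taus = stopping_times(xs, ys, a)
    kappa = len(taus) - 1
    return kappa, histogram_estimate(xs, ys, kappa, taus[kappa])


xs = [0.2, 0.7, 0.4, 0.9, 0.1, 0.6, 0.3, 0.8]
ys = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]    # y_i = 1 exactly when x_i > 0.5
print(stopping_times(xs, ys, lambda i: 1.0))       # [1, 2, 3, 8]
kappa, cells = fixed_sample_estimate(xs, ys, lambda i: 1.0)
print(kappa, sorted(cells.items()))                # 3, cells 1-4 -> 0.0 and cells 5-8 -> 1.0
```

In this toy run the level-3 partition is accepted only once all eight cells of $(0, 1]$ are occupied: before that, empty cells create spurious jumps and the variation test on $(-2, 2]$ fails. The variation is computed from the occupied cells only, so the cost does not grow with the number $2^k$ of cells.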

The $L_2(\mu)$-consistency of the estimates is established in the following theorem.

Theorem 1 Let $a : \mathbb{N} \to (0, \infty)$ be a known, non-decreasing function. For every
$m(\cdot) \in \mathcal{F}(a)$, every probability distribution $\mu$ on $\mathbb{R}$, and every stable pair $(\mathbf{x}, \mathbf{y}) \in
\Omega(\mu, m)$ such that the components of $\mathbf{y}$ are bounded,

$$\int (\hat m_k(x) - m(x))^2\,\mu(dx) \to 0 \quad \text{and} \quad \int (\tilde m_n(x) - m(x))^2\,\mu(dx) \to 0.$$

In other words, the estimates $\tilde m_n$ and $\hat m_k$ are $L_2(\mu)$-consistent.

Remark 2. The definition of $\hat m_k$ is based solely on $a(\cdot)$ and the given numerical
sequences. In advance of the data, one need only know a bound on the variation of
its limiting regression on the intervals $(-i, i]$. The limiting distribution $\mu$, the
pre-asymptotic behavior of the individual sequences, and the bound on the $y_i$ need not
be known in advance.

Remark 3. Let $B(M)$ denote the class of monotone bounded functions $m : \mathbb{R} \to \mathbb{R}$
such that $|m(x)| \le M$ for all $x \in \mathbb{R}$. Since $B(M) \subseteq \mathcal{F}(a)$ with $a(n) = 2M$, Theorem 1
is applicable to $B(M)$.

Remark 4. Let $\Lambda(C)$ denote the class of Lipschitz continuous functions $m : \mathbb{R} \to \mathbb{R}$
such that $|m(x) - m(z)| \le C|z - x|$ for all $x, z \in \mathbb{R}$. Since $\Lambda(C) \subseteq \mathcal{F}(a)$ with
$a(n) = 2Cn + \epsilon$ where $0 < \epsilon < \infty$ is arbitrary, Theorem 1 is applicable to $\Lambda(C)$.
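
The two inclusions in Remarks 3 and 4 follow directly from the definition of total variation. A short worked version, for an arbitrary ordered sequence $-n < t_0 < t_1 < \cdots < t_p = n$ (the symbol $p$ for the number of partition points is introduced here only for the calculation):

```latex
% Monotone m with |m(x)| <= M: all increments have the same sign, so the sum telescopes.
\sum_{i=1}^{p} |m(t_i) - m(t_{i-1})| = |m(t_p) - m(t_0)| \le 2M .

% Lipschitz m with constant C: bound each increment by C times the step length.
\sum_{i=1}^{p} |m(t_i) - m(t_{i-1})| \le C \sum_{i=1}^{p} (t_i - t_{i-1})
  = C\,(t_p - t_0) \le 2Cn .
```

Taking the supremum over all such sequences gives $V(m : -n, n) \le 2M$ in the first case and $V(m : -n, n) \le 2Cn$ in the second.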

Theorem 1 and Proposition 1 imply the next corollary.

Corollary 1 Let $a : \mathbb{N} \to (0, \infty)$ be a known, non-decreasing function. For every
stationary ergodic process $(X_1, Y_1), (X_2, Y_2), \ldots \in \mathbb{R} \times \mathbb{R}$ such that $X_1$ has distribution
$\mu$, $Y_1$ is bounded with probability one, and $m(x) = E(Y_1 \mid X_1 = x) \in \mathcal{F}(a)$,

$$\int (\hat m_k(x) - m(x))^2\,\mu(dx) \to 0 \quad \text{and} \quad \int (\tilde m_n(x) - m(x))^2\,\mu(dx) \to 0$$

with probability one.

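Theorem 1 and Proposition 1 imply even more. We use the same notation as in
the proof of Proposition 1.

Corollary 2 Let $a : \mathbb{N} \to (0, \infty)$ be a known, non-decreasing function.
Let $(X_1, Y_1), (X_2, Y_2), \ldots \in \mathbb{R} \times \mathbb{R}$ be a stationary process such that $Y_1$ is bounded
with probability one. Let $m_{\mathcal{E}} := \frac{d\nu_{\mathcal{E}}}{d\mu_{\mathcal{E}}}$, that is, $\nu_{\mathcal{E}}(A) = \int_A m_{\mathcal{E}}(x)\,\mu_{\mathcal{E}}(dx)$. Assume
that $m_{\mathcal{E}}(\cdot) \in \mathcal{F}(a)$ with probability one. Then

$$\int (\hat m_k(x) - m_{\mathcal{E}}(x))^2\,\mu_{\mathcal{E}}(dx) \to 0 \quad \text{and} \quad \int (\tilde m_n(x) - m_{\mathcal{E}}(x))^2\,\mu_{\mathcal{E}}(dx) \to 0$$

with probability one.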

The conditions in Theorem 1 cannot be significantly weakened. 

Theorem 2 Let $\lambda$ denote the uniform distribution on $[0, 1]$. There is no $L_2(\lambda)$
consistent regression procedure for the family of stable sequences $(\mathbf{x}, \mathbf{y})$ such that
$x_i \in [0, 1]$ has limiting distribution $\lambda$, and $y_i \in \{0, 1\}$ has limiting regression $m$
with $V(m : 0, 1) < \infty$.

4 Proof of Theorem 1 

Lemma 1 Let $a : \mathbb{N} \to (0, \infty)$ be a known, non-decreasing function. For every
$m(\cdot) \in \mathcal{F}(a)$, every probability distribution $\mu$ on $\mathbb{R}$, every stable pair $(\mathbf{x}, \mathbf{y}) \in \Omega(\mu, m)$,
and for all $k \ge 0$, $\tau_k$ is well defined and finite.

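Proof: By definition $\tau_0 = 1$. Hence we may assume $k \ge 1$. Let $f$ be any function
with bounded variation $V(f : -i, i) < \infty$ on $(-i, i]$. Define

$$(f \circ \pi_k)(x) = \frac{1}{\mu(\pi_k[x])} \int_{\pi_k[x]} f(z)\,\mu(dz).$$

Note that $f \circ \pi_k$ is piecewise constant on the cells of $\pi_k$.

For $f$ non-decreasing it is immediate that $V(f \circ \pi_k : -i, i) \le V(f : -i, i)$. If
$f$ is not necessarily non-decreasing then $f(x) = u(x) - v(x)$ where $u(\cdot)$ and $v(\cdot)$
are non-decreasing, $V(u : -i, i) \le V(f : -i, i)$ and $V(v : -i, i) \le 2V(f : -i, i)$ (cf.
Kolmogorov and Fomin [10]). It follows from the definition that $f \circ \pi_k = u \circ \pi_k - v \circ \pi_k$,
and since $u$ and $v$ are non-decreasing, so are $u \circ \pi_k$ and $v \circ \pi_k$. Therefore

$$
\begin{aligned}
V(f \circ \pi_k : -i, i) &= V(u \circ \pi_k - v \circ \pi_k : -i, i) \\
&\le V(u \circ \pi_k : -i, i) + V(v \circ \pi_k : -i, i) \\
&\le V(u : -i, i) + V(v : -i, i) \\
&\le 3V(f : -i, i),
\end{aligned}
$$

as the variation of a sum is at most the sum of the variations. Now note that, since
$V(m : -i, i) \le a(i)$, as $n \to \infty$

$$
\begin{aligned}
V(\hat m_{k,n} : -i, i) &= \sum_{j=-i2^k+1}^{i2^k-1} \left| \frac{\hat\nu_n(A_{k,j})}{\hat\mu_n(A_{k,j})} - \frac{\hat\nu_n(A_{k,j+1})}{\hat\mu_n(A_{k,j+1})} \right| \\
&\to \sum_{j=-i2^k+1}^{i2^k-1} \left| \frac{\nu(A_{k,j})}{\mu(A_{k,j})} - \frac{\nu(A_{k,j+1})}{\mu(A_{k,j+1})} \right| \\
&= V(m \circ \pi_k : -i, i) \\
&\le 3V(m : -i, i) < 4a(i),
\end{aligned}
$$

where $A_{k,j} = ((j-1)/2^k, j/2^k]$ denotes the $j$'th cell of $\pi_k$.

Thus $\tau_k$ is well defined and finite. □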

Proof of Theorem 1: Fix a sequence $(\mathbf{x}, \mathbf{y})$ satisfying the conditions of the
theorem. For each $k \ge 1$ define $g_k(x) = \hat m_k(x) - m(x)$. It follows from the definition
of $\tau_k$ and the assumption that $m(\cdot) \in \mathcal{F}(a)$ that

$$V(g_k : -i, i) \le V(\hat m_k : -i, i) + V(m : -i, i) < 5a(i) \quad \text{for } 1 \le i \le k.$$

Let $D/2 \ge 1$ be a common bound for $m(\cdot)$ and the elements of $\mathbf{y}$, so that $|g_k(x)| \le D$
for each $x$.

Let $U = \{u_1, u_2, \ldots\}$ be those numbers $u$ for which $\mu(\{u\}) > 0$. Then $U$ is either
finite or countably infinite. Note that $\mu$ may be decomposed as a sum $\mu_d + \mu_c$, where
$\mu_d$ is a purely atomic measure supported on $U$, and $\mu_c$ is non-atomic. Fix $\epsilon \in (0, 1)$.
Let $T \ge 1$ be an integer such that

$$\mu(\{x : |x| > T\}) < \frac{\epsilon}{D^2} \tag{4}$$

and let $J \ge 1$ be so large that

$$\sum_{j=J+1}^{|U|} \mu(\{u_j\}) < \frac{\epsilon}{D^2}, \tag{5}$$

where $|U|$ denotes the cardinality of $U$. For $k \ge 1$ define

$$\Delta(k) = \min\{\mu_c(A) : A \in \pi_k,\ A \subseteq (-T, T],\ \mu_c(A) > 0\}$$

and

$$\Theta(k) = \max\{\mu_c(A) : A \in \pi_k,\ A \subseteq (-T, T]\}.$$

Note that $\Theta(k) \ge \Delta(k) > 0$ for each $k$ and that $\Theta(k)$ is a non-increasing function of
$k$. Let

$$\Theta^* = \lim_{k \to \infty} \Theta(k).$$

Suppose that $\Theta^* > 0$. Then there is a sequence of intervals $A_k \in \pi_k$ such that
$\mu_c(A_k) \ge \Theta^*$ and $\mathrm{clos}(A_{k+1}) \subseteq \mathrm{clos}(A_k)$ for each $k \ge 1$, where $\mathrm{clos}(A)$ denotes the
closure of $A$. As $\mathrm{len}(A_k) \to 0$, $\bigcap_k \mathrm{clos}(A_k)$ is a singleton $\{x_0\}$. Continuity of $\mu_c$ implies
that $\mu_c(\{x_0\}) \ge \Theta^* > 0$, which contradicts the fact that $\mu_c$ is non-atomic. Therefore
$\Theta^* = 0$. Let $K \ge 1$ be so large that

$$\Theta(K) < \frac{\epsilon^2}{10\,D^2\,a(T)}. \tag{6}$$
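Fix an atom $u \in U$. If $r < k$ then

$$\frac{\hat\nu_{\tau_k}(\{u\}) + D\hat\mu_{\tau_k}(\{u\})}{\hat\mu_{\tau_k}(\pi_r[u])} \le \frac{\hat\nu_{\tau_k}(\pi_k[u]) + D\hat\mu_{\tau_k}(\pi_k[u])}{\hat\mu_{\tau_k}(\pi_k[u])} \le \frac{\hat\nu_{\tau_k}(\pi_r[u]) + D\hat\mu_{\tau_k}(\pi_r[u])}{\hat\mu_{\tau_k}(\{u\})},$$

since $\hat\nu_{\tau_k} + D\hat\mu_{\tau_k}$ is a non-negative measure and $\{u\} \subseteq \pi_k[u] \subseteq \pi_r[u]$.

As $k$ tends to infinity, stability implies that

$$\hat\mu_{\tau_k}(\pi_r[u]) \to \mu(\pi_r[u]), \quad \hat\nu_{\tau_k}(\pi_r[u]) \to \nu(\pi_r[u]), \quad \hat\mu_{\tau_k}(\{u\}) \to \mu(\{u\}), \quad \hat\nu_{\tau_k}(\{u\}) \to \nu(\{u\}).$$

As $r$ tends to infinity, continuity of the measures $\mu$ and $\nu$ implies that

$$\mu(\pi_r[u]) \to \mu(\{u\}), \quad \nu(\pi_r[u]) \to \nu(\{u\}).$$

From these relations we conclude that

$$\lim_{k \to \infty} \frac{\hat\nu_{\tau_k}(\pi_k[u])}{\hat\mu_{\tau_k}(\pi_k[u])} = \frac{\nu(\{u\})}{\mu(\{u\})}. \tag{7}$$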

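By (3), (7) and (2) there exists $K' > \max(K, T)$ such that for all indices $k \ge K'$,

$$\sup_{A \in \mathcal{A}} |\hat\mu_{\tau_k}(A) - \mu(A)| < \frac{\epsilon}{4D}\,\Delta(K), \tag{8}$$

$$|g_k(u_i)|^2 < \frac{\epsilon}{J} \quad \text{for } i = 1, \ldots, J, \tag{9}$$

and

$$\left| \int_A \hat m_k\,d\hat\mu_{\tau_k} - \int_A m\,d\mu \right| < \frac{\epsilon}{4}\,\Delta(K) \tag{10}$$

for every cell $A \in \pi_K$ with $A \subseteq (-T, T]$.

Fix $k \ge K'$, and let $A \in \pi_K$ be such that $\mu(A) > 0$ and $A \subseteq (-T, T]$. Inequalities
(8) and (10) imply that

$$
\begin{aligned}
\left| \int_A g_k(x)\,\mu(dx) \right| &\le \left| \int_A \hat m_k\,d\mu - \int_A \hat m_k\,d\hat\mu_{\tau_k} \right| + \left| \int_A \hat m_k\,d\hat\mu_{\tau_k} - \int_A m\,d\mu \right| \\
&\le D \sup_{A' \in \mathcal{A}} |\hat\mu_{\tau_k}(A') - \mu(A')| + \frac{\epsilon}{4}\,\Delta(K)
\end{aligned}
$$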

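and therefore

$$\frac{\left| \int_A g_k(x)\,\mu(dx) \right|}{\mu(A)} \le \frac{\epsilon}{2}. \tag{11}$$

Consider those points

$$H_k = \{x \in \mathbb{R} : |g_k(x)| > \epsilon\}$$

for which $|g_k|$ exceeds $\epsilon$, and define

$$\mathcal{H}_k = \{A \in \pi_K : A \cap H_k \ne \emptyset,\ A \subseteq (-T, T],\ \mu(A) > 0\}.$$

If $A \in \mathcal{H}_k$ then there exists $x \in A$ such that $|g_k(x)| > \epsilon$. Assume without loss of
generality that $g_k(x) > \epsilon$. By virtue of (11) there exists $z \in A$ such that $g_k(z) \le \epsilon/2$,
and therefore, $|g_k(x) - g_k(z)| \ge \epsilon/2$ for some $x, z \in A$. Consequently

$$\frac{\epsilon}{2}\,|\mathcal{H}_k| \le V(g_k : -T, T) < 5a(T),$$

from which follows that

$$|\mathcal{H}_k| < \frac{10\,a(T)}{\epsilon}. \tag{12}$$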
Consider now the $L_2(\mu)$ error of $\hat m_k$. From the definition of $\mathcal{H}_k$ and inequalities
(12), (6), (5), (9), and (4) it follows that

$$
\begin{aligned}
\int |g_k(x)|^2\,\mu(dx) &\le \sum_{A \in \mathcal{H}_k} \int_A D^2\,d\mu_c + \sum_{A \in \mathcal{H}_k} \int_A |g_k(x)|^2\,d\mu_d(x) \\
&\quad + \sum_{A \notin \mathcal{H}_k,\ A \subseteq (-T, T]} \int_A \epsilon^2\,d\mu + \int_{|x| > T} D^2\,d\mu \\
&\le \epsilon + \sum_{i=1}^{J} |g_k(u_i)|^2 + \sum_{i=J+1}^{|U|} D^2 \mu_d(\{u_i\}) + \epsilon^2 + \epsilon \\
&\le 4\epsilon + \epsilon^2.
\end{aligned}
$$

Letting $k \to \infty$ and $\epsilon \to 0$ shows that $\int |g_k(x)|^2\,\mu(dx) \to 0$. Since $\kappa_n \nearrow \infty$, the $L_2(\mu)$
convergence of $\tilde m_n$ to $m$ is immediate. □

5 Proof of Theorem 2



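Proof of Theorem 2: For $k \ge 1$ define the $k$'th Rademacher function as

$$h_k(x) = \begin{cases} 1 & \text{if } 2j\,2^{-k} \le x < (2j+1)\,2^{-k} \text{ for some } 0 \le j < 2^{k-1}, \\ 0 & \text{otherwise}, \end{cases}$$

and let

$$h_0(x) = \begin{cases} 0.5 & \text{if } x \in [0, 1], \\ 0 & \text{otherwise}. \end{cases}$$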
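
The proof relies on the functions $h_k$, $k \ge 1$, being mutually far apart in $L_2(\lambda)$: the squared distance between any two distinct ones is $0.5$. The sketch below checks this exactly with rational arithmetic. It assumes the half-open reading $[2j\,2^{-k}, (2j+1)\,2^{-k})$ of the defining intervals; the helper names are illustrative.

```python
from fractions import Fraction


def h(k, x):
    """h_0 = 1/2 on [0, 1]; for k >= 1, h_k = 1 on [2j/2^k, (2j+1)/2^k), 0 <= j < 2^(k-1)."""
    if not (0 <= x <= 1):
        return Fraction(0)
    if k == 0:
        return Fraction(1, 2)
    cell = int(x * 2 ** k)          # floor(x * 2^k) for x >= 0
    return Fraction(1) if (cell % 2 == 0 and cell < 2 ** k) else Fraction(0)


def sq_dist(j, k):
    """Exact integral of |h_j - h_k|^2 over [0, 1] with respect to the uniform distribution.

    Both functions are constant on dyadic cells of width 2^-max(j, k), so it is enough
    to evaluate them at the cell midpoints and average.
    """
    n = 2 ** max(j, k)
    mids = [Fraction(2 * i + 1, 2 * n) for i in range(n)]
    return sum((h(j, x) - h(k, x)) ** 2 for x in mids) / n


print(sq_dist(1, 2))   # 1/2
print(sq_dist(2, 5))   # 1/2
print(sq_dist(0, 3))   # 1/4: h_0 differs from every h_k by exactly 1/2 on [0, 1)
```

Because the distances do not shrink, no subsequence of $h_1, h_2, \ldots$ converges in $L_2(\lambda)$, while averages of $h_k$ over any fixed interval approach those of $h_0$.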



Define $\mathcal{F} = \{h_0, h_1, h_2, \ldots\}$ and let $\mathcal{F}_1 = \{h_1, h_2, \ldots\}$. Let $\lambda$ denote the uniform
distribution on $[0, 1]$. We will prove even more than stated in Theorem 2, namely:

There is no $L_2(\lambda)$ consistent regression estimation procedure for the family

$$\Omega^* = \bigcup_{m \in \mathcal{F}} \Omega(\lambda, m) \cap \{(\mathbf{x}, \mathbf{y}) : x_n \in [0, 1],\ y_n \in \{0, 1\} \text{ for all } n \ge 1\}.$$



This statement says that even for the countable class $\mathcal{F}$ of regression functions
there is no $L_2(\lambda)$ consistent estimation procedure. We briefly describe the main idea of
the proof. Let $\Phi = \{\phi_1, \phi_2, \ldots\}$ be any regression estimation procedure. If $\Phi$ fails to be
consistent for some sequence $(\mathbf{x}, \mathbf{y}) \in \bigcup_{m \in \mathcal{F}_1} \Omega(\lambda, m)$ with $x_i \in [0, 1]$ and $y_i \in \{0, 1\}$,
there is nothing to prove. Assuming then that $\Phi$ is consistent for every such sequence,
we construct a stable sequence $(\mathbf{x}^*, \mathbf{y}^*)$ such that $\phi_n(\cdot : (x_1^*, y_1^*), \ldots, (x_n^*, y_n^*))$ fails to
converge. The sequence $(\mathbf{x}^*, \mathbf{y}^*)$ has limiting distribution $\lambda$ and limiting regression $h_0$.
It is constructed by 'splicing' together longer and longer blocks of stable sequences
$(\mathbf{x}^{(k)}, \mathbf{y}^{(k)}) \in \Omega(\lambda, h_k)$. When applied to the resulting sequence, the procedure $\Phi$ first
produces estimates close to $h_1$; as the sample size is increased $\Phi$ produces estimates
close to $h_2$, then $h_3$, and so on. As the $h_k$'s fail to converge, so too do the estimates

$$\phi_n(\cdot : (x_1^*, y_1^*), \ldots, (x_n^*, y_n^*)), \quad n \ge 1.$$



Note that each $h_j$ is supported on $[0, 1]$ and that $\int |h_j(x) - h_k(x)|^2\,\lambda(dx) = 0.5$
whenever $j \ne k$, and $j \ge 1$, $k \ge 1$. Let

$$\nu_k(A) = \int_A h_k(x)\,\lambda(dx)$$

and for each finite sequence $(x_1, y_1), \ldots, (x_m, y_m) \in [0, 1] \times \{0, 1\}$ let

$$\Delta(x_1, \ldots, x_m) = \sup_{A \in \mathcal{A}} \left| \frac{1}{m} \sum_{j=1}^{m} I\{x_j \in A\} - \lambda(A) \right| = \sup_{A \in \mathcal{A}} |\hat\mu_m(A) - \lambda(A)|$$

and

$$\Delta_k((x_1, y_1), \ldots, (x_m, y_m)) = \sup_{A \in \mathcal{A}} \left| \frac{1}{m} \sum_{j=1}^{m} y_j I\{x_j \in A\} - \nu_k(A) \right| = \sup_{A \in \mathcal{A}} |\hat\nu_m(A) - \nu_k(A)|,$$

where $\mathcal{A}$ is the collection of all intervals of the form $(a, b]$ and $(-\infty, b]$ with $a, b \in \mathbb{R}$.
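
For points in $[0, 1]$ the quantity $\Delta(x_1, \ldots, x_m)$ can be computed exactly from the order statistics. The sketch below uses an observation that is ours, not the paper's: writing $D(t) = \hat\mu_m(-\infty, t] - \lambda(-\infty, t]$, the supremum over intervals $(a, b]$ equals $\sup D - \inf D$, where $D$ is $0$ outside $[0, 1]$, jumps up at each data point, and decreases linearly in between. The function name is hypothetical.

```python
def interval_discrepancy(xs):
    """sup over intervals (a, b] and (-inf, b] of |mu_hat_m(A) - lambda(A)|.

    Assumes every x lies in [0, 1] and lambda is the uniform distribution on [0, 1].
    """
    m = len(xs)
    s = sorted(xs)
    # Value of D at each data point (after the jump), and 0 at the ends of the range.
    upper = max([0.0] + [(i + 1) / m - x for i, x in enumerate(s)])
    # Value of D just before each data point (before the jump), and 0 at the ends.
    lower = min([0.0] + [i / m - x for i, x in enumerate(s)])
    return upper - lower


print(interval_discrepancy([0.5]))           # 1.0: a tiny interval around 0.5 has frequency 1
print(interval_discrepancy([0.25, 0.75]))    # 0.5
m = 100
print(interval_discrepancy([(2 * i + 1) / (2 * m) for i in range(m)]))   # about 1/m
```

Any finite sample has discrepancy at least $1/m$ because of the point masses, so the convergence of $\Delta$ to zero asserted for stable sequences is a genuinely asymptotic statement.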

A minor modification of a standard proof of the Glivenko-Cantelli Theorem (e.g.
using the bracketing approach found in Pollard [19]) shows that, as $m \to \infty$,

$$\Delta(x_1, \ldots, x_m) \to 0 \quad \text{and} \quad \Delta_k((x_1, y_1), \ldots, (x_m, y_m)) \to 0 \tag{13}$$

for all $(\mathbf{x}, \mathbf{y}) \in \Omega(\lambda, h_k) \cap \{(\mathbf{x}, \mathbf{y}) : x_n \in (0, 1),\ y_n \in \{0, 1\} \text{ for all } n \ge 1\}$.

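Suppose now that $\Phi = \{\phi_1, \phi_2, \ldots\}$ is consistent for $\mathcal{F}_1$. For each $k \ge 1$ select a
sequence

$$(\mathbf{x}^{(k)}, \mathbf{y}^{(k)}) = ((x_1^{(k)}, y_1^{(k)}), (x_2^{(k)}, y_2^{(k)}), \ldots)$$

such that

$$(\mathbf{x}^{(k)}, \mathbf{y}^{(k)}) \in \Omega(\lambda, h_k) \cap \{(\mathbf{x}, \mathbf{y}) : x_n \in (0, 1),\ y_n \in \{0, 1\} \text{ for all } n \ge 1\}$$

and

$$x_i^{(k)} = x_j^{(l)} \text{ if and only if } i = j,\ k = l \tag{14}$$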



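(e.g. typical sample sequences from independent i.i.d. time series $\{(X_i^{(k)}, Y_i^{(k)})\}$
where $X_i^{(k)}$ has distribution $\lambda$; cf. Proposition 1). Define

$$l_k = \min\Bigl\{L : \sup_{m \ge L} \Delta(x_1^{(k)}, \ldots, x_m^{(k)}) \le \frac{1}{k+1}\Bigr\}, \tag{15}$$

$$\tilde l_k = \min\Bigl\{L : \sup_{m \ge L} \Delta_k((x_1^{(k)}, y_1^{(k)}), \ldots, (x_m^{(k)}, y_m^{(k)})) \le \frac{1}{k+1}\Bigr\}. \tag{16}$$

By (13), both $l_k$ and $\tilde l_k$ are finite. Consider the infinite sequence $(\mathbf{x}^{(1)}, \mathbf{y}^{(1)})$. As
$h_1 \in \mathcal{F}_1$, and $\Phi$ is consistent for $\mathcal{F}_1$ by assumption,

$$\lim_{n \to \infty} \int |\phi_n(x : (x_1^{(1)}, y_1^{(1)}), \ldots, (x_n^{(1)}, y_n^{(1)})) - h_1(x)|^2\,\lambda(dx) = 0.$$

Therefore there is an integer $n_1 \ge \max(l_2, \tilde l_2)$ and a corresponding initial segment
$(\mathbf{v}^{(1)}, \mathbf{w}^{(1)}) = ((x_1^{(1)}, y_1^{(1)}), \ldots, (x_{n_1}^{(1)}, y_{n_1}^{(1)}))$ of $(\mathbf{x}^{(1)}, \mathbf{y}^{(1)})$ such that

$$\int |\phi_{n_1}(x : (\mathbf{v}^{(1)}, \mathbf{w}^{(1)})) - h_1(x)|^2\,\lambda(dx) \le \frac{1}{40}$$

and

$$\Delta(\mathbf{v}^{(1)}) \le \frac{1}{2}$$

and

$$\Delta_1((\mathbf{v}^{(1)}, \mathbf{w}^{(1)})) \le \frac{1}{2}.$$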

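Let $n_0 = 0$ and let $n_1$ be as defined above. Now suppose that for all $1 \le j \le k$ one
has constructed sequences $(\mathbf{v}^{(j)}, \mathbf{w}^{(j)})$ of finite length $n_j$ in such a way that

$$(\mathbf{v}^{(j)}, \mathbf{w}^{(j)}) = \bigl((\mathbf{v}^{(j-1)}, \mathbf{w}^{(j-1)}), (x_1^{(j)}, y_1^{(j)}), \ldots, (x_{n_j - n_{j-1}}^{(j)}, y_{n_j - n_{j-1}}^{(j)})\bigr), \tag{17}$$

$$\int |\phi_{n_j}(x : (\mathbf{v}^{(j)}, \mathbf{w}^{(j)})) - h_j(x)|^2\,\lambda(dx) \le \frac{1}{40}, \tag{18}$$

$$\Delta(\mathbf{v}^{(j)}) \le (j+1)^{-1}, \tag{19}$$

$$\Delta_j((\mathbf{v}^{(j)}, \mathbf{w}^{(j)})) \le (j+1)^{-1}, \tag{20}$$

$$n_j \ge j \cdot \max(l_{j+1}, \tilde l_{j+1}), \tag{21}$$

where $(\mathbf{v}^{(0)}, \mathbf{w}^{(0)})$ denotes the empty sequence.
As $(\mathbf{v}^{(k)}, \mathbf{w}^{(k)})$ is finite, the concatenation

$$(v_1^{(k)}, w_1^{(k)}), \ldots, (v_{n_k}^{(k)}, w_{n_k}^{(k)}), (x_1^{(k+1)}, y_1^{(k+1)}), (x_2^{(k+1)}, y_2^{(k+1)}), \ldots$$

is contained in $\Omega(\lambda, h_{k+1})$. It follows from the consistency of $\Phi$ that for all large
enough $n$

$$(v_1^{(k)}, w_1^{(k)}), \ldots, (v_{n_k}^{(k)}, w_{n_k}^{(k)}), (x_1^{(k+1)}, y_1^{(k+1)}), (x_2^{(k+1)}, y_2^{(k+1)}), \ldots, (x_{n - n_k}^{(k+1)}, y_{n - n_k}^{(k+1)})$$

satisfies (17), (18), (19) and (20) with $j$ replaced by $k + 1$. Select $n_{k+1} > n_k$ so large
that the same is true of (21).

As $(\mathbf{v}^{(k+1)}, \mathbf{w}^{(k+1)})$ is an extension of $(\mathbf{v}^{(k)}, \mathbf{w}^{(k)})$, repeating the above process
indefinitely yields an infinite sequence $(\mathbf{x}^*, \mathbf{y}^*)$. By construction, the functions $\phi_n(\cdot) =
\phi_n(\cdot : (x_1^*, y_1^*), \ldots, (x_n^*, y_n^*))$ do not converge in $L_2(\lambda)$. Indeed, it follows from (18) and
from the inequality $a^2 \ge d^2/5 - b^2 - c^2$ whenever $(a + b + c)^2 = d^2$ that

$$
\begin{aligned}
\int |\phi_{n_k}(x) - \phi_{n_l}(x)|^2\,\lambda(dx) &\ge \frac{1}{5} \int |h_k(x) - h_l(x)|^2\,\lambda(dx) - \int |h_k(x) - \phi_{n_k}(x)|^2\,\lambda(dx) \\
&\quad - \int |\phi_{n_l}(x) - h_l(x)|^2\,\lambda(dx) \\
&\ge \frac{1}{10} - \frac{1}{40} - \frac{1}{40} \\
&= \frac{1}{20}
\end{aligned}
$$

whenever $k \ne l$, $k \ge 1$, $l \ge 1$.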
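
The elementary inequality used in the last step can be verified in two lines. The following is a sketch of one way to see it, via the Cauchy-Schwarz inequality applied to the vectors $(a, b, c)$ and $(1, 1, 1)$:

```latex
d^2 = (a + b + c)^2 \le 3\,(a^2 + b^2 + c^2)
\quad\Longrightarrow\quad
a^2 \ge \frac{d^2}{3} - b^2 - c^2 \ge \frac{d^2}{5} - b^2 - c^2 .
% In the proof it is applied pointwise with
%   a = \phi_{n_k} - \phi_{n_l},  b = h_k - \phi_{n_k},  c = \phi_{n_l} - h_l,
% so that a + b + c = h_k - h_l = d, and then integrated against \lambda.
```

The constant $1/5$ is therefore not sharp, but it suffices: together with $\int |h_k - h_l|^2\,d\lambda = 0.5$ it yields the lower bound $1/10 - 1/40 - 1/40 = 1/20$.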

It remains to show that the limiting distribution of $\mathbf{x}^*$ is $\lambda$ and the limiting
regression of $(\mathbf{x}^*, \mathbf{y}^*)$ is $h_0$. To this end, fix $k \ge 1$ and let $A \subseteq [0, 1]$ be an arbitrary
interval. It is easily verified that

$$|\nu_k(A) - \nu_0(A)| \le 2^{-k+1} \le \frac{2}{k}. \tag{22}$$

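Let $\hat\mu_n(A)$ and $\hat\nu_n(A)$ be evaluated on $((x_1^*, y_1^*), \ldots, (x_n^*, y_n^*))$, and for each $1 \le r \le
n_{k+1} - n_k$ define

$$\hat\nu_{r,k}(A) = \frac{1}{r} \sum_{j=n_k+1}^{n_k+r} y_j^* I\{x_j^* \in A\}.$$

The equation

$$\hat\nu_{n_k+r}(A) = \frac{n_k}{n_k + r}\,\hat\nu_{n_k}(A) + \frac{r}{n_k + r}\,\hat\nu_{r,k}(A)$$

implies the bound

$$|\hat\nu_{n_k+r}(A) - \nu_0(A)| \le \frac{n_k}{n_k + r}\,|\hat\nu_{n_k}(A) - \nu_0(A)| + \frac{r}{n_k + r}\,|\hat\nu_{r,k}(A) - \nu_0(A)| =: I + II.$$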

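By virtue of (20) and (22)

$$I \le |\hat\nu_{n_k}(A) - \nu_k(A)| + |\nu_0(A) - \nu_k(A)| \le \frac{1}{k+1} + \frac{2}{k}.$$

If $n_{k+1} - n_k \ge r \ge \tilde l_{k+1}$ then by (16)

$$\Delta_{k+1}((x_{n_k+1}^*, y_{n_k+1}^*), \ldots, (x_{n_k+r}^*, y_{n_k+r}^*)) = \Delta_{k+1}((x_1^{(k+1)}, y_1^{(k+1)}), \ldots, (x_r^{(k+1)}, y_r^{(k+1)})) \le \frac{1}{k+2}$$

and therefore

$$II \le |\hat\nu_{r,k}(A) - \nu_{k+1}(A)| + |\nu_{k+1}(A) - \nu_0(A)| \le \frac{1}{k+2} + \frac{2}{k+1}.$$

On the other hand, if $0 < r < \tilde l_{k+1}$ then (21) implies that

$$II \le \frac{2r}{n_k + r} \le \frac{2r}{kr + r} = \frac{2}{k+1}.$$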
These bounds ensure that, since $A$ was an arbitrary interval,

$$\max\Bigl\{ \sup_{A \in \mathcal{A}} |\hat\nu_n(A) - \nu_0(A)| : n_k \le n < n_{k+1} \Bigr\} \le \frac{6}{k}$$

and consequently,

$$\lim_{n \to \infty} \sup_{A \in \mathcal{A}} |\hat\nu_n(A) - \nu_0(A)| = 0.$$

A similar (in fact, easier) analysis establishes

$$\lim_{n \to \infty} \sup_{A \in \mathcal{A}} |\hat\mu_n(A) - \lambda(A)| = 0.$$

Finally, by (14), for all $t \in \mathbb{R}$

$$\hat\mu_n(\{t\}) \to \lambda(\{t\}) = 0 \quad \text{and} \quad \hat\nu_n(\{t\}) \to \nu_0(\{t\}) = 0. \quad \Box$$